Discovering Meaningful Clusters from Mining the Software Engineering Literature
نویسندگان
چکیده
Document clustering is becoming an increasingly popular technique for identifying relationships in unstructured text. In this paper, we attempt to make sense of the output of a clustering algorithm applied to software engineering research papers. We introduce a notion of cluster “stability” as a measure of the meaningfulness of a cluster. We assess its usefulness and limitations in identifying meaningful clusters. In the process, we track how important research topics may have changed from year to year.
منابع مشابه
An Integrated DEA and Data Mining Approach for Performance Assessment
This paper presents a data envelopment analysis (DEA) model combined with Bootstrapping to assess performance of one of the Data mining Algorithms. We applied a two-step process for performance productivity analysis of insurance branches within a case study. First, using a DEA model, the study analyzes the productivity of eighteen decision-making units (DMUs). Using a Malmquist index, DEA deter...
متن کاملA Survey on Approaches for Mining Frequent Itemsets
Data mining is gaining importance due to huge amount of data available. Retrieving information from the warehouse is not only tedious but also difficult in some cases. The most important usage of data mining is customer segmentation in marketing, shopping cart analyzes, management of customer relationship, campaign management, Web usage mining, text mining, player tracking and so on. In data mi...
متن کاملA review of text mining approaches and their function in discovering and extracting a topic
Background and aim: Four text mining methods are examined and focused on understanding and identifying their properties and limitations in subject discovery. Methodology: The study is an analytical review of the literature of text mining and topic modeling. Findings: LSA could be used to classify specific and unique topics in documents that address only a single topic. The other three text min...
متن کاملFast Mining of Temporal Data Clustering
Temporal data clustering provides underpinning techniques for discovering the intrinsic structure and condensing information over temporal data. In this paper, we present a temporal data clustering framework via a weighted clustering produced by initial clustering analysis on different temporal data representations. In the existing system a novel weighted function guided by clustering validatio...
متن کاملAnalysis of Pre-processing and Post-processing Methods and Using Data Mining to Diagnose Heart Diseases
Today, a great deal of data is generated in the medical field. Acquiring useful knowledge from this raw data requires data processing and detection of meaningful patterns and this objective can be achieved through data mining. Using data mining to diagnose and prognose heart diseases has become one of the areas of interest for researchers in recent years. In this study, the literature on the ap...
متن کامل